On the Comparison of Line Spectral Frequencies and Mel-Frequency Cepstral Coefficients Using Feedforward Neural Network for Language Identification

Author

Teddy Surya Gunawan

Abstract

Received Jan 3, 2018; revised Mar 5, 2018; accepted Mar 23, 2018.

Of the many audio features available, this paper compares two of the most popular: line spectral frequencies (LSF) and Mel-frequency cepstral coefficients (MFCC). We trained a feedforward neural network with various numbers of hidden layers and hidden nodes to identify five languages: Arabic, Chinese, English, Korean, and Malay. LSF, MFCC, and a combination of both were extracted as feature vectors. Systematic experiments were conducted to find the optimum parameters, namely sampling frequency, frame size, model order, and neural network structure. The recognition rate per frame was converted to a recognition rate per audio file using majority voting. On average, the recognition rates for LSF, MFCC, and the combination of both features are 96%, 92%, and 96%, respectively. Therefore, LSF is the most suitable feature for language identification using a feedforward neural network classifier.
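The conversion from per-frame to per-file recognition by majority voting can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `majority_vote` and `file_level_accuracy` are hypothetical, the labels are made-up examples, and the tie-breaking rule (first label seen wins) is an assumption, since the abstract does not say how ties are handled.

```python
from collections import Counter


def majority_vote(frame_labels):
    """Return the label predicted for the most frames of one audio file."""
    if not frame_labels:
        raise ValueError("need at least one frame prediction")
    # Assumption: on a tie, the label that appeared first is returned.
    return Counter(frame_labels).most_common(1)[0][0]


def file_level_accuracy(per_file_frames, true_labels):
    """Fraction of files whose majority-vote label matches the true label."""
    votes = [majority_vote(frames) for frames in per_file_frames]
    correct = sum(v == t for v, t in zip(votes, true_labels))
    return correct / len(true_labels)


# Hypothetical per-frame classifier outputs for two audio files.
predictions = [
    ["Arabic", "Arabic", "Korean"],   # majority: Arabic
    ["English", "Malay", "Malay"],    # majority: Malay
]
print(file_level_accuracy(predictions, ["Arabic", "English"]))
```

Because one vote is cast per frame, a file can be labeled correctly even when a sizable minority of its frames are misclassified, which is why the per-file rate can exceed the per-frame rate.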

Similar resources

On the use of perceptual Line Spectral pairs Frequencies and higher-order residual moments for Speaker Identification

Conventional Speaker Identification (SI) systems utilise spectral features like Mel-Frequency Cepstral Coefficients (MFCC) or Perceptual Linear Prediction (PLP) as a frontend module. Line Spectral pairs Frequencies (LSF) are a popular alternative representation of Linear Prediction Coefficients (LPC). In this paper, an investigation is carried out to extract LSF from perceptually modified speech....

Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)

Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...

Significance of formants from difference spectrum for speaker identification

In this paper, we describe a prototype speaker identification system using auto-associative neural network (AANN) and formant features. Our experiments demonstrate that formants extracted from difference spectrum perform significantly better than formants extracted from normal spectrum for the task of speaker identification. We also demonstrate that formants from difference spectrum provide com...

Two Stage Neural Network model for Recognition of Indian Languages from Speech

India is a multilingual country. Officially about 20 languages are recognized by the government and there are about 500 languages spoken at different parts of the country. For developing the speech systems in Indian context, it is necessary to capture the language specific knowledge automatically from speech. Further it may be exploited for different speech tasks such as language identification...

Formant Estimation and Tracking Using Deep Learning

Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the former task the input is a stationary speech segment such as the middle part of a vowel and the goal is to estimate the formant frequencies, whereas in the latter task the input is a series of speech frames and the goal is to track the trajectory of the formant frequencies throughout t...

Publication date: 2018
